Skip to content

AI: streaming-safe middleware, agent-driven Runner, ModelBuilder tool binding#139

Open
bedus-creation wants to merge 16 commits into
mainfrom
task/langgraph-agent-harness
Open

AI: streaming-safe middleware, agent-driven Runner, ModelBuilder tool binding#139
bedus-creation wants to merge 16 commits into
mainfrom
task/langgraph-agent-harness

Conversation

@bedus-creation

@bedus-creation bedus-creation commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Summary

Hardens the AI agent harness: middleware no longer breaks streaming, tool binding/execution have one clear owner each, and the Runner is driven by the agent. Tests are realigned to the current API.

Streaming through middleware (the main fix)

With middleware attached, .stream() used to buffer the whole response. A middleware written the natural way — final = await handler(model) — drains the entire Response stream before returning, so:

  • the first token only appeared after the full generation (seconds of latency instead of sub-second first-token), and
  • after-hooks (e.g. response logging) fired only at the very end.

Fix: build_pipeline now hands every middleware a Response-returning handler, so a layer can attach .then(callback) and return it without awaiting. Awaiting still works for prompt() (buffered), but streaming stays token-by-token and the after-hook fires exactly once on completion (post-stream for stream(), on the final message for prompt()).

  • Middleware may be sync or async — both supported. The canonical form is now sync:
    def handle(self, model, handler):
        ...  # before
        return handler(model).then(after_fn)   # don't await — streaming-safe
  • build_pipeline checks isinstance(_, Response) before the awaitable check, since Response is itself awaitable (otherwise a sync handle returning a Response would be re-buffered).
  • Example AgentLogger converted to sync + .then().

Tool binding / execution ownership

  • ModelBuilder binds agent.tools() onto the chat model (what the LLM needs to emit tool calls).
  • Runner no longer binds — it keeps the name → BaseTool map purely to execute returned tool calls. bind_tools stores serialized schemas, not callables, so execution still needs the real tools.

Runner takes the agent

  • Runner(agent, model) / StreamRunner(agent, model) instead of threading tools/max_steps through every call site. model stays a separate arg because the middleware pipeline can transform it.

Typing & API cleanup

  • Agent.tools() typed as list[BaseTool] (lazy TYPE_CHECKING import) — fixes the tool.name/dict[str, BaseTool] type errors.
  • @model decorator now sets model (it previously set an unread _model, making the decorator a silent no-op).

Tests

  • Realigned test_agent.py / test_agent_decorators.py to the current API: instructions() method, provider attribute, ModelBuilder._resolve_model; removed the deleted memory decorator.
  • Added regression tests: streaming-through-middleware streams token-by-token with a single after-hook, and the prompt after-hook fires.

Verification

  • pytest tests/ai/191 passed.
  • Full suite (excluding live-Postgres tests) → 1541 passed, 7 skipped.

🤖 Generated with Claude Code

…I intact

Swap the per-provider SDK internals of ai/agent.py (_run/_stream over the
anthropic/openai/google SDKs) for a single LangChain/LangGraph backend:
init_chat_model builds the chat model and create_agent drives the tool loop,
with the final AIMessage mapped back to AgentResponse. The user-facing surface
is unchanged — prompt/stream/fake/assert_prompted/assert_not_prompted/reset,
the lifecycle hooks, and the decorators keep identical signatures.

- _build_model() is the seam tests patch to inject a fake chat model.
- _build_messages() now renders attachments via Document.to_langchain_block().
- Add Document.to_langchain_block(): inline text, base64 image/file blocks.
- Add ai/fakes.py fake_chat_model(): replays scripted AIMessage turns through a
  GenericFakeChatModel (bind_tools no-op) so the real create_agent loop runs
  offline; exported from the ai package root.
- New optional [langgraph] extra (langchain + langchain-core + langgraph).

The 23 tests in tests/ai/test_agent_fake.py stay green and unmodified (fake()
short-circuits before the backend). Adds tests/ai/test_agent_langgraph_backend.py
exercising the real loop offline: simple reply, full tool-calling loop, usage
mapping, attachment blocks, provider mapping, and streaming.
@bedus-creation bedus-creation force-pushed the task/langgraph-agent-harness branch from beedb37 to 990e842 Compare June 23, 2026 07:01
@bedus-creation bedus-creation changed the title LangGraph agent test harness: Agent.fake() + Agent.record() Back the Agent with LangChain/LangGraph (identical public API) Jun 23, 2026
…ent through it

Turn ai/config.py into a config package and resolve models/providers through a
new Lab helper instead of hardcoded dicts on the Agent.

- ai/config/: split provider dataclasses (config.py) from the top-level AIConfig
  (ai.py), add config/__init__.py re-exporting them, and give each provider a
  models map keyed by modality (default / default_image / default_audio /
  default_transcribe). Fix the draft's circular import (AIConfig imported the
  provider configs from the package root mid-init) and the placeholder model
  values (google text default, elevenlabs models).
- AIConfig selects the default provider per modality: default (text),
  default_image, default_audio, default_transcribe. image.py/audio.py now read
  default_image/default_audio (was image_provider/audio_provider).
- ai/lab.py: Lab(StrEnum) + ModelType resolve the provider, default model, and
  the "<langchain-provider>:<model>" URL from Config (google → google_genai).
- Agent: _resolve_model() and _build_model() now go through Lab; removed the
  stale _DEFAULT_MODELS/_LANGCHAIN_PROVIDERS references and the dead
  _execute_tool() (create_agent runs tools itself).
- Tests: drop the _build_model monkeypatch helper; the backend tests now patch
  the real langchain.chat_models.init_chat_model seam via pytest monkeypatch.
  Add test_lab.py; update image/audio provider-selection mocks.

The 23 tests in tests/ai/test_agent_fake.py stay green and unmodified.
@bedus-creation bedus-creation changed the title Back the Agent with LangChain/LangGraph (identical public API) AI: LangGraph-backed Agent + config package & Lab resolver Jun 23, 2026
…l loop

Replace the LangGraph create_agent backend with a plain init_chat_model call
driven by a Runner that resolves and executes tool calls itself.

- runner.py: Runner(model, tools, max_steps) binds tools, invokes the model,
  executes requested tool calls, feeds results back, loops to a final answer;
  StreamRunner yields content tokens through the same loop. Fully typed.
- Agent._run/_stream delegate to Runner/StreamRunner (threading _max_steps);
  no create_agent.
- System message is declarative via instructions()/_instructions — removed the
  per-call system= and messages= arguments from prompt()/stream().
- Resolve provider/model through Lab directly (dropped the _lab() helper) and
  import Lab at module top.
- Config split into ai/config/{ai,config}.py.

KNOWN RED: ai/config/__init__.py is absent, so 'from fastapi_startkit.ai.config
import AIConfig' fails — the AI test suite does not collect and AIProvider import
breaks. Backend tests also still assume the old create_agent result shape and the
tuple return of _build_messages. Follow-ups.
@bedus-creation bedus-creation changed the title AI: LangGraph-backed Agent + config package & Lab resolver AI: init_chat_model backend + Runner tool loop, config & Lab Jun 23, 2026
@bedus-creation bedus-creation changed the title AI: init_chat_model backend + Runner tool loop, config & Lab AI: init_chat_model backend, Runner tool loop, and fake()/record() testing harness Jun 23, 2026
@bedus-creation bedus-creation force-pushed the task/langgraph-agent-harness branch from 0278c05 to e31432b Compare June 23, 2026 20:49
Add a class-level testing harness so an agent can be faked or recorded
without a real model provider:

- Agent.fake({...}) and Agent.record(path) return an AgentBinding usable
  as a context manager or test decorator. The binding swaps a stand-in
  into the service container under the agent's class name and auto-resets
  on exit, so even a controller's own ChatAgent().prompt(...) is covered.
- FakeAgent answers from glob patterns; RecordingAgent records the real
  reply to JSON once, then replays it. Both expose assert_prompted().
- Agent.make()/faked() resolve the bound stand-in for assertions.
- prompt()/stream() delegate to an active binding; the in-process
  agent.fake({...}) instance API is preserved via a dual-purpose accessor.
- Lab.ModelType carries the models-map key as a static mapping.
- Import AIConfig from the fastapi_startkit.ai namespace in tests and the
  AI facade stub, matching the provider registration.

Tests: full suite 1541 passed, 7 skipped; ruff clean.
@bedus-creation bedus-creation force-pushed the task/langgraph-agent-harness branch from e31432b to dbfc847 Compare June 23, 2026 20:57
Convert bare assert statements to self.assertEqual in the example/agents
feature tests, matching the unittest.IsolatedAsyncioTestCase base. Keep the
async test methods and the @ChatAgent.fake / @ChatAgent.record decorators
intact.
bedus-creation and others added 10 commits June 23, 2026 23:45
…sertions

test(agents): use unittest assertions in example feature tests
…Lab default fallback

- Replace the FakeAccessor descriptor with a single Agent.fake() classmethod;
  drop the unused faked()/bound aliases and rename the internal stand-in
  resolver to _faked(), sharing container lookup via _binding().
- Runner.run() now returns the tool result directly instead of looping the
  output back to the model (custom single-shot tool semantics).
- Resolve provider via Lab.get_provider(self._provider) so a None provider
  falls back to the configured default instead of raising.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Convert the AI tests from pytest function style to unittest.TestCase classes
(fake/record, decorators, lab, config, provider, document, response, agent).
Rename test_agent_langgraph_backend.py to test_agent.py, add a TestAgentRecord
class for record-and-replay, and align expectations with the actual config
default provider (google).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ckend

AIProvider.register() now resolves AIConfig and merges it into the config
store (merge_config_from) instead of binding into the container and setting
it in boot(); boot() is a no-op. Drop the unused _memory_backend class
attribute on Agent. Update provider tests to the new behaviour.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Agent.prompt() and stream() are now coroutines/async generators, matching
the framework's async-first design. The Runner uses ainvoke/astream/ainvoke
for tools, the fake/record stand-ins and AgentSnapshot.resolve are async, and
the fake/record/agent tests run under IsolatedAsyncioTestCase.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop the bundled provider SDKs (anthropic, openai, google-generativeai) and the
unused langgraph from the [ai] extra and dev group — providers are pulled lazily
by init_chat_model and are now opt-in. Fix stale langgraph references in the
fake-model helper to point at the [ai] extra.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…n ModelBuilder

Streaming previously buffered the entire response when an agent had
middleware: `final = await handler(model)` drained the Response stream
before returning, so the first token only appeared after the full
generation and after-hooks fired late. build_pipeline now hands each
middleware a Response-returning handler so layers can attach
`.then(callback)` and return without awaiting — streaming-safe, and the
after-hook fires once on completion (buffered for prompt, post-stream for
stream). Middleware may be sync or async; the example AgentLogger is now
sync + `.then()`.

Other AI changes:
- ModelBuilder binds tools (agent.tools()) onto the chat model; Runner no
  longer binds, only keeps the tool map for execution.
- Runner takes the agent (Runner(agent, model)) instead of threading
  tools/max_steps separately.
- Agent.tools() typed as list[BaseTool]; @model decorator sets `model`
  (was the unread `_model`).
- Tests updated to the current API (instructions() method, provider attr,
  ModelBuilder._resolve_model) and drop the removed `memory` decorator;
  add regression tests for streaming-through-middleware and the prompt
  after-hook.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bedus-creation bedus-creation changed the title AI: init_chat_model backend, Runner tool loop, and fake()/record() testing harness AI: streaming-safe middleware, agent-driven Runner, ModelBuilder tool binding Jun 25, 2026
Add direct tests for ai/pipeline.py — the Response deferred-callback
mechanism and build_pipeline onion that back agent middleware. Locks down
that the .then after-hook fires exactly once on both the buffered (await)
and streaming (async for) consumption paths, modelled on the AgentLogger
request/response logging pattern. Brings pipeline.py to 100% coverage.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant